Semantic Change Detection With Gaussian Word Embeddings

Authors

Abstract

The diachronic study of the evolution of languages is of importance in natural language processing (NLP). Recent years have witnessed a surge in computational approaches for the detection and characterization of lexical semantic change (LSC), owing to the availability of diachronic corpora and advancing word representation techniques. We propose a Gaussian word embedding (w2g)-based method and present a comprehensive study of LSC detection. W2g is a probabilistic, distribution-based word embedding model that represents words as Gaussian mixture models, using covariance information along with the existing mean (word vector). We also extensively study several aspects of w2g-based LSC detection under the SemEval-2020 Task 1 evaluation framework as well as on the Google N-gram corpus. In Sub-task 1 (LSC binary classification), we report the highest overall ranking, with first rank in two (German and Swedish) of the four languages (English, Swedish, German and Latin), and the highest Spearman correlation in Sub-task 2 (LSC ranking) for Swedish. Our overall rankings in the classification and ranking sub-tasks are 1st and 7th, respectively. A qualitative analysis is also presented.
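To illustrate the idea behind distribution-based change detection, the sketch below models a word's embedding in each time period as a single diagonal-covariance Gaussian and scores change by the KL divergence between the two periods' distributions. This is a minimal illustration under our own assumptions, not the paper's w2g mixture-model implementation; the function name and toy vectors are hypothetical.

```python
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL(N0 || N1) between two Gaussians with diagonal covariances.

    mu0, mu1: mean vectors; var0, var1: per-dimension variances.
    A larger value indicates a bigger shift in the word's distribution.
    """
    d = mu0.shape[0]
    return 0.5 * (np.sum(var0 / var1)              # trace term
                  + np.sum((mu1 - mu0) ** 2 / var1)  # mean-shift term
                  - d                                # dimension offset
                  + np.sum(np.log(var1) - np.log(var0)))  # log-det ratio

# Hypothetical embeddings of one word in two time periods:
mu_old, var_old = np.zeros(4), np.ones(4)
mu_new, var_new = np.array([1.0, 0.0, 0.0, 0.0]), np.ones(4)
change_score = kl_diag_gaussians(mu_old, var_old, mu_new, var_new)
```

Ranking all target words by such a divergence gives a graded change score (Sub-task 2 style); thresholding it gives a binary decision (Sub-task 1 style).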


Similar Articles

Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change

Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test. Word embeddings show promise as a diachronic tool, but have not been carefully evaluated. We develop a robust methodology for quantifying semantic change by evaluating word embeddings (PPMI, SVD, word2vec...


AutoExtend: Combining Word Embeddings with Semantic Resources

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings which incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an...


Adjusting Word Embeddings with Semantic Intensity Orders

Semantic lexicons such as WordNet and PPDB have been used to improve the vector-based semantic representations of words by adjusting the word vectors. However, such lexicons lack semantic intensity information, inhibiting adjustment of vector spaces to better represent semantic intensity scales. In this work, we adjust word vectors using the semantic intensity information in addition to synonym...


Gaussian Mixture Embeddings for Multiple Word Prototypes

Recently, word representation has been increasingly focused on for its excellent properties in representing the word semantics. Previous works mainly suffer from the problem of polysemy phenomenon. To address this problem, most of previous models represent words as multiple distributed vectors. However, it cannot reflect the rich relations between words by representing words as points in the em...


Gaussian LDA for Topic Models with Word Embeddings

Continuous space word embeddings learned from large, unstructured corpora have been shown to be effective at capturing semantic regularities in language. In this paper we replace LDA’s parameterization of “topics” as categorical distributions over opaque word types with multivariate Gaussian distributions on the embedding space. This encourages the model to group words that are a priori known t...



Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3120645